Agrupamiento de textos cortos en dominios cruzados
Authors
Abstract
Nowadays, social networks have become an ideal tool for sharing information in real time. Different types of users use social media to comment on their activities, opinions, personal views, etc. The information posted on these media has become of particular interest to online reputation analysts, for instance, to identify relevant trends. However, analyzing large amounts of data is a very tedious task for a human. There are classification techniques that offer alternative solutions to this problem, but given the dynamism of these social networks, having a model for each trend is not feasible, since new trends emerge every day and, even worse, new trends emerge in new domains. In this paper, we present an unsupervised method for short-text categorization. Our experimental results show that the proposed method yields a robust text representation that performs well on cross-domain problems.
Research in Computing Science 115 (2016), pp. 133–145; rec. 2016-04-18; acc. 2016-05-02
Similar resources
Clustering Iterativo de Textos Cortos con Representaciones basadas en Conceptos
Abstract: The current trend of working with short documents (blogs, text messages, and others) has generated growing interest in automatic processing techniques for documents with these characteristics. In this context, the clustering of short texts is a very important research area that can play a fundamental role in organizing these large volumes...
A Particle Swarm Optimizer to Cluster Parallel Spanish-English Short-text Corpora / Un Optimizador basado en Cúmulo de Partículas para el Agrupamiento de Textos Cortos de Colecciones Paralelas en Español-Inglés
Short-text clustering is currently an important research area because of its applicability to web information retrieval, text summarization and text mining. These texts are often available in different languages and in parallel multilingual corpora. Some previous works have demonstrated the effectiveness of a discrete Particle Swarm Optimizer algorithm, named CLUDIPSO, for clustering monolingual ...
Uso del punto de transición en la selección de términos índice para agrupamiento de textos cortos
Nowadays a wide variety of clustering methods exist. Each of these methods must make the critical decision of which keywords will be used to represent the collection. In this paper we deal with the problem of clustering a set of short texts from a specific domain. The problem thus becomes more complex because of the small number of terms that can be used in the term selection proce...
Detección del lenguaje figurativo e ironía en textos cortos
Abstract: This work proposes a model for solving Task 11 of the SemEval 2015 competition. The proposed model uses lexical features extracted from the texts, as well as the polarity of the words, obtained using different tools. The model was validated on a Twitter corpus, and the performance of two of the most widely used algorithms for classif...
Density-based clustering of short-text corpora / Agrupamiento de textos cortos basado en densidad
In this work, we analyse the performance of different density-based algorithms on short-text and narrow-domain short-text corpora. We attempt to determine to what extent the features of this kind of corpora affect the density computation of the clusterings obtained, and how robust these algorithms are to the different complexity levels.
Journal title:
- Research in Computing Science
Volume 115, issue
Pages -
Publication date: 2016